Overview

Dataset statistics

Number of variables: 10
Number of observations: 20402133
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.8 GiB
Average record size in memory: 199.0 B

Variable types

Numeric: 8
Categorical: 2

Alerts

Date of Tweet has a high cardinality: 3207253 distinct values (High cardinality)
Language has a high cardinality: 66 distinct values (High cardinality)
User ID is highly correlated with Year Account Created (High correlation)
Following is highly correlated with Followers and 1 other field (High correlation)
Followers is highly correlated with Following and 1 other field (High correlation)
Total Tweets is highly correlated with Following and 1 other field (High correlation)
Year Account Created is highly correlated with User ID (High correlation)
Following is highly correlated with Followers (High correlation)
Total Tweets is highly correlated with Followers (High correlation)
Unnamed: 0 is highly correlated with Tweet ID (High correlation)
Tweet ID is highly correlated with Unnamed: 0 (High correlation)
Following is highly skewed (γ1 = 43.0372489) (Skewed)
Followers is highly skewed (γ1 = 44.53262786) (Skewed)
Total Tweets is highly skewed (γ1 = 44.25165229) (Skewed)
Unnamed: 0 is uniformly distributed (Uniform)
Unnamed: 0 has unique values (Unique)
Followers has 506476 (2.5%) zeros (Zeros)
Retweet Count has 4277424 (21.0%) zeros (Zeros)
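
The γ1 values in the skewness alerts are skewness coefficients. As a rough illustration, the sketch below computes the moment-based skewness g1 = m3 / m2^1.5 in plain Python. This is an assumption about the definition: the report's own estimator may apply a small-sample bias correction, and the function name and toy data are hypothetical.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Toy data (hypothetical): a symmetric sample and one with a single huge value,
# loosely mimicking follower counts where a few accounts dominate.
symmetric = [1, 2, 3, 4, 5]
heavy_tail = [1, 1, 2, 2, 3, 1000]

print(skewness(symmetric))
print(skewness(heavy_tail))
```

A symmetric sample gives zero, while a long right tail pushes the coefficient well above zero, which is the pattern flagged for Following, Followers and Total Tweets.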

Reproduction

Analysis started: 2022-04-16 02:27:03.907847
Analysis finished: 2022-04-16 02:41:03.393673
Duration: 13 minutes and 59.49 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 20402133
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10201066
Minimum: 0
Maximum: 20402132
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 155.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1020106.6
Q1: 5100533
median: 10201066
Q3: 15301599
95-th percentile: 19382025.4
Maximum: 20402132
Range: 20402132
Interquartile range (IQR): 10201066

Descriptive statistics

Standard deviation: 5889588.634
Coefficient of variation (CV): 0.5773503116
Kurtosis: -1.2
Mean: 10201066
Median Absolute Deviation (MAD): 5100533
Skewness: -4.302153923 × 10^-17
Sum: 2.081235053 × 10^14
Variance: 3.468725428 × 10^13
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
13601421 | 1 | < 0.1%
13601428 | 1 | < 0.1%
13601427 | 1 | < 0.1%
13601426 | 1 | < 0.1%
13601425 | 1 | < 0.1%
13601424 | 1 | < 0.1%
13601423 | 1 | < 0.1%
13601422 | 1 | < 0.1%
13601420 | 1 | < 0.1%
Other values (20402123) | 20402123 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
20402132 | 1 | < 0.1%
20402131 | 1 | < 0.1%
20402130 | 1 | < 0.1%
20402129 | 1 | < 0.1%
20402128 | 1 | < 0.1%
20402127 | 1 | < 0.1%
20402126 | 1 | < 0.1%
20402125 | 1 | < 0.1%
20402124 | 1 | < 0.1%
20402123 | 1 | < 0.1%
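
The descriptive statistics reported for Unnamed: 0 are what one would expect from a plain row index 0, 1, ..., n-1. The sketch below recomputes them from closed-form expressions for a discrete uniform sequence. It assumes the report uses the sample standard deviation (divisor n-1) and excess kurtosis; those are assumptions, and the variable names are illustrative.

```python
import math

n = 20402133  # number of observations; the column holds 0 .. n-1

mean = (n - 1) / 2

# Sum of squared deviations of 0..n-1 from the mean is n * (n^2 - 1) / 12,
# so the sample variance (divisor n-1) simplifies to n * (n + 1) / 12.
variance_sample = n * (n + 1) / 12
std_sample = math.sqrt(variance_sample)

# Coefficient of variation: standard deviation divided by the mean.
cv = std_sample / mean

# Excess kurtosis of a discrete uniform distribution on n points.
excess_kurtosis = -6 * (n * n + 1) / (5 * (n * n - 1))

print(mean, std_sample, cv, excess_kurtosis)
```

Under those assumptions the results agree with the reported mean (10201066), standard deviation (5889588.634), CV (0.5773503116, close to 1/√3) and kurtosis (-1.2), which supports reading the column as a leftover index rather than a feature.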

User ID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3478339
Distinct (%): 17.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.341856004 × 10^17
Minimum: 76
Maximum: 1.51402441 × 10^18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 155.7 MiB

Quantile statistics

Minimum: 76
5-th percentile: 35672554
Q1: 491304400
median: 7.447845409 × 10^17
Q3: 1.305252611 × 10^18
95-th percentile: 1.497324963 × 10^18
Maximum: 1.51402441 × 10^18
Range: 1.51402441 × 10^18
Interquartile range (IQR): 1.30525261 × 10^18

Descriptive statistics

Standard deviation: 6.390079389 × 10^17
Coefficient of variation (CV): 1.007603986
Kurtosis: -1.790794388
Mean: 6.341856004 × 10^17
Median Absolute Deviation (MAD): 7.44784538 × 10^17
Skewness: 0.1376486544
Sum: 8.204694187 × 10^18
Variance: 4.08331146 × 10^35
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1.499763124 × 10^18 | 18859 | 0.1%
2453069245 | 16702 | 0.1%
1.203552378 × 10^18 | 9969 | < 0.1%
31077930 | 8685 | < 0.1%
247619342 | 8544 | < 0.1%
88196314 | 7481 | < 0.1%
4230938057 | 6857 | < 0.1%
1.066275283 × 10^18 | 6263 | < 0.1%
1.260888403 × 10^18 | 5429 | < 0.1%
1.216550422 × 10^18 | 5375 | < 0.1%
Other values (3478329) | 20307969 | 99.5%

Minimum 10 values
Value | Count | Frequency (%)
76 | 1 | < 0.1%
221 | 3 | < 0.1%
224 | 1 | < 0.1%
324 | 1 | < 0.1%
418 | 2 | < 0.1%
422 | 2 | < 0.1%
509 | 6 | < 0.1%
521 | 3 | < 0.1%
556 | 2 | < 0.1%
614 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1.51402441 × 10^18 | 4 | < 0.1%
1.514011053 × 10^18 | 1 | < 0.1%
1.514010489 × 10^18 | 1 | < 0.1%
1.514007089 × 10^18 | 3 | < 0.1%
1.514005262 × 10^18 | 1 | < 0.1%
1.514005057 × 10^18 | 1 | < 0.1%
1.514001594 × 10^18 | 1 | < 0.1%
1.514001536 × 10^18 | 1 | < 0.1%
1.514001 × 10^18 | 1 | < 0.1%
1.514000789 × 10^18 | 1 | < 0.1%

Following
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 60618
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1734.896307
Minimum: 0
Maximum: 2312359
Zeros: 161113
Zeros (%): 0.8%
Negative: 0
Negative (%): 0.0%
Memory size: 155.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 15
Q1: 151
median: 528
Q3: 1662
95-th percentile: 5005
Maximum: 2312359
Range: 2312359
Interquartile range (IQR): 1511

Descriptive statistics

Standard deviation: 6355.42819
Coefficient of variation (CV): 3.66328994
Kurtosis: 5612.044669
Mean: 1734.896307
Median Absolute Deviation (MAD): 460
Skewness: 43.0372489
Sum: 3.53955852 × 10^10
Variance: 40391467.48
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 161113 | 0.8%
5001 | 131737 | 0.6%
1 | 129055 | 0.6%
2 | 77887 | 0.4%
3 | 73059 | 0.4%
5000 | 73043 | 0.4%
6 | 56754 | 0.3%
5002 | 56454 | 0.3%
5 | 54333 | 0.3%
13 | 54183 | 0.3%
Other values (60608) | 19534515 | 95.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 161113 | 0.8%
1 | 129055 | 0.6%
2 | 77887 | 0.4%
3 | 73059 | 0.4%
4 | 53920 | 0.3%
5 | 54333 | 0.3%
6 | 56754 | 0.3%
7 | 53230 | 0.3%
8 | 46085 | 0.2%
9 | 44533 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
2312359 | 1 | < 0.1%
2312095 | 1 | < 0.1%
1424263 | 1 | < 0.1%
1424244 | 1 | < 0.1%
1424235 | 1 | < 0.1%
1424234 | 1 | < 0.1%
1424227 | 1 | < 0.1%
1424210 | 1 | < 0.1%
1424199 | 1 | < 0.1%
1424157 | 1 | < 0.1%

Followers
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 214828
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12836.24223
Minimum: 0
Maximum: 52767632
Zeros: 506476
Zeros (%): 2.5%
Negative: 0
Negative (%): 0.0%
Memory size: 155.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 61
median: 321
Q3: 1361
95-th percentile: 10617
Maximum: 52767632
Range: 52767632
Interquartile range (IQR): 1300

Descriptive statistics

Standard deviation: 280304.5629
Coefficient of variation (CV): 21.83696426
Kurtosis: 2387.985147
Mean: 12836.24223
Median Absolute Deviation (MAD): 307
Skewness: 44.53262786
Sum: 2.618867213 × 10^11
Variance: 7.857064799 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 506476 | 2.5%
1 | 285616 | 1.4%
2 | 228759 | 1.1%
3 | 190880 | 0.9%
4 | 167885 | 0.8%
5 | 153599 | 0.8%
6 | 141173 | 0.7%
7 | 130745 | 0.6%
8 | 119371 | 0.6%
9 | 110652 | 0.5%
Other values (214818) | 18366977 | 90.0%

Minimum 10 values
Value | Count | Frequency (%)
0 | 506476 | 2.5%
1 | 285616 | 1.4%
2 | 228759 | 1.1%
3 | 190880 | 0.9%
4 | 167885 | 0.8%
5 | 153599 | 0.8%
6 | 141173 | 0.7%
7 | 130745 | 0.6%
8 | 119371 | 0.6%
9 | 110652 | 0.5%

Maximum 10 values
Value | Count | Frequency (%)
52767632 | 1 | < 0.1%
52013313 | 1 | < 0.1%
47189262 | 1 | < 0.1%
31286187 | 1 | < 0.1%
31284443 | 1 | < 0.1%
26455115 | 1 | < 0.1%
26377492 | 1 | < 0.1%
26354172 | 1 | < 0.1%
26348013 | 1 | < 0.1%
24934027 | 1 | < 0.1%

Total Tweets
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 645207
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54764.88325
Minimum: 0
Maximum: 47010526
Zeros: 151
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 155.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 104
Q1: 2161
median: 11809
Q3: 49365
95-th percentile: 247856
Maximum: 47010526
Range: 47010526
Interquartile range (IQR): 47204

Descriptive statistics

Standard deviation: 142835.4966
Coefficient of variation (CV): 2.608158516
Kurtosis: 12149.91797
Mean: 54764.88325
Median Absolute Deviation (MAD): 11322
Skewness: 44.25165229
Sum: 1.117320432 × 10^12
Variance: 2.040197909 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 26440 | 0.1%
2 | 24040 | 0.1%
3 | 21993 | 0.1%
4 | 20547 | 0.1%
5 | 19569 | 0.1%
6 | 19199 | 0.1%
7 | 17891 | 0.1%
8 | 17441 | 0.1%
25 | 17207 | 0.1%
9 | 16974 | 0.1%
Other values (645197) | 20200832 | 99.0%

Minimum 10 values
Value | Count | Frequency (%)
0 | 151 | < 0.1%
1 | 26440 | 0.1%
2 | 24040 | 0.1%
3 | 21993 | 0.1%
4 | 20547 | 0.1%
5 | 19569 | 0.1%
6 | 19199 | 0.1%
7 | 17891 | 0.1%
8 | 17441 | 0.1%
9 | 16974 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
47010526 | 2 | < 0.1%
47010202 | 2 | < 0.1%
47001594 | 1 | < 0.1%
47001553 | 1 | < 0.1%
47000936 | 1 | < 0.1%
47000906 | 1 | < 0.1%
47000010 | 1 | < 0.1%
46999969 | 1 | < 0.1%
46999645 | 1 | < 0.1%
46998997 | 1 | < 0.1%

Tweet ID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 20314584
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.505113445 × 10^18
Minimum: 1.496738675 × 10^18
Maximum: 1.514030604 × 10^18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 155.7 MiB

Quantile statistics

Minimum: 1.496738675 × 10^18
5-th percentile: 1.497619265 × 10^18
Q1: 1.50082366 × 10^18
median: 1.504844726 × 10^18
Q3: 1.509408762 × 10^18
95-th percentile: 1.513086567 × 10^18
Maximum: 1.514030604 × 10^18
Range: 1.729192897 × 10^16
Interquartile range (IQR): 8.585102103 × 10^15

Descriptive statistics

Standard deviation: 4.939617604 × 10^15
Coefficient of variation (CV): 0.003281890559
Kurtosis: -1.176191804
Mean: 1.505113445 × 10^18
Median Absolute Deviation (MAD): 4.22642174 × 10^15
Skewness: 0.104753572
Sum: 4.581629262 × 10^18
Variance: 2.439982208 × 10^31
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1.497764193 × 10^18 | 2 | < 0.1%
1.497764237 × 10^18 | 2 | < 0.1%
1.497764222 × 10^18 | 2 | < 0.1%
1.497764222 × 10^18 | 2 | < 0.1%
1.497764222 × 10^18 | 2 | < 0.1%
1.497764222 × 10^18 | 2 | < 0.1%
1.497764222 × 10^18 | 2 | < 0.1%
1.497764222 × 10^18 | 2 | < 0.1%
1.497764223 × 10^18 | 2 | < 0.1%
1.497764223 × 10^18 | 2 | < 0.1%
Other values (20314574) | 20402113 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
1.496738675 × 10^18 | 1 | < 0.1%
1.496738675 × 10^18 | 1 | < 0.1%
1.496738676 × 10^18 | 1 | < 0.1%
1.496738676 × 10^18 | 1 | < 0.1%
1.496738676 × 10^18 | 1 | < 0.1%
1.496738676 × 10^18 | 1 | < 0.1%
1.496738676 × 10^18 | 1 | < 0.1%
1.496738677 × 10^18 | 1 | < 0.1%
1.496738677 × 10^18 | 1 | < 0.1%
1.496738677 × 10^18 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1.514030604 × 10^18 | 1 | < 0.1%
1.514030603 × 10^18 | 1 | < 0.1%
1.514030603 × 10^18 | 1 | < 0.1%
1.514030603 × 10^18 | 1 | < 0.1%
1.514030602 × 10^18 | 1 | < 0.1%
1.514030602 × 10^18 | 1 | < 0.1%
1.5140306 × 10^18 | 1 | < 0.1%
1.5140306 × 10^18 | 1 | < 0.1%
1.514030599 × 10^18 | 1 | < 0.1%
1.514030598 × 10^18 | 1 | < 0.1%

Date of Tweet
Categorical

HIGH CARDINALITY

Distinct: 3207253
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 GiB
Value | Count
2022-03-28 03:12:22 | 417
2022-03-28 03:12:21 | 398
2022-03-28 03:12:24 | 394
2022-04-04 03:08:11 | 394
2022-04-04 02:35:07 | 393
Other values (3207248) | 20400137

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 288044
Unique (%): 1.4%

Sample

1st row: 2022-04-01 00:00:00
2nd row: 2022-04-01 00:00:00
3rd row: 2022-04-01 00:00:00
4th row: 2022-04-01 00:00:00
5th row: 2022-04-01 00:00:00

Common Values

Value | Count | Frequency (%)
2022-03-28 03:12:22 | 417 | < 0.1%
2022-03-28 03:12:21 | 398 | < 0.1%
2022-03-28 03:12:24 | 394 | < 0.1%
2022-04-04 03:08:11 | 394 | < 0.1%
2022-04-04 02:35:07 | 393 | < 0.1%
2022-03-28 03:17:45 | 391 | < 0.1%
2022-04-04 02:35:05 | 388 | < 0.1%
2022-04-04 03:20:13 | 387 | < 0.1%
2022-04-04 03:20:16 | 387 | < 0.1%
2022-03-28 03:17:44 | 384 | < 0.1%
Other values (3207243) | 20398200 | > 99.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
2022-03-07 | 567745 | 1.4%
2022-03-06 | 566767 | 1.4%
2022-03-05 | 546780 | 1.3%
2022-03-08 | 519385 | 1.3%
2022-03-09 | 493857 | 1.2%
2022-03-15 | 484221 | 1.2%
2022-03-21 | 480519 | 1.2%
2022-03-04 | 480290 | 1.2%
2022-03-17 | 468632 | 1.1%
2022-03-18 | 468098 | 1.1%
Other values (86438) | 35727972 | 87.6%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Retweet Count
Real number (ℝ≥0)

ZEROS

Distinct: 63601
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1198.569732
Minimum: 0
Maximum: 2952269
Zeros: 4277424
Zeros (%): 21.0%
Negative: 0
Negative (%): 0.0%
Memory size: 155.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 29
Q3: 309
95-th percentile: 4684
Maximum: 2952269
Range: 2952269
Interquartile range (IQR): 308

Descriptive statistics

Standard deviation: 6461.94484
Coefficient of variation (CV): 5.391379963
Kurtosis: 2442.363763
Mean: 1198.569732
Median Absolute Deviation (MAD): 29
Skewness: 18.38342198
Sum: 2.445337909 × 10^10
Variance: 41756731.11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 4277424 | 21.0%
1 | 1265031 | 6.2%
2 | 693333 | 3.4%
3 | 483402 | 2.4%
4 | 373359 | 1.8%
5 | 304281 | 1.5%
6 | 258516 | 1.3%
7 | 225900 | 1.1%
8 | 201466 | 1.0%
9 | 180061 | 0.9%
Other values (63591) | 12139360 | 59.5%

Minimum 10 values
Value | Count | Frequency (%)
0 | 4277424 | 21.0%
1 | 1265031 | 6.2%
2 | 693333 | 3.4%
3 | 483402 | 2.4%
4 | 373359 | 1.8%
5 | 304281 | 1.5%
6 | 258516 | 1.3%
7 | 225900 | 1.1%
8 | 201466 | 1.0%
9 | 180061 | 0.9%

Maximum 10 values
Value | Count | Frequency (%)
2952269 | 1 | < 0.1%
436782 | 2 | < 0.1%
436781 | 2 | < 0.1%
436778 | 5 | < 0.1%
436774 | 1 | < 0.1%
436772 | 1 | < 0.1%
436769 | 1 | < 0.1%
436768 | 4 | < 0.1%
436767 | 5 | < 0.1%
436764 | 4 | < 0.1%

Language
Categorical

HIGH CARDINALITY

Distinct: 66
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 GiB
Value | Count
en | 13793384
fr | 984332
de | 926169
it | 854292
und | 832856
Other values (61) | 3011100

Length

Max length: 3
Median length: 2
Mean length: 2.040849307
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Common Values

Value | Count | Frequency (%)
en | 13793384 | 67.6%
fr | 984332 | 4.8%
de | 926169 | 4.5%
it | 854292 | 4.2%
und | 832856 | 4.1%
es | 738594 | 3.6%
th | 287577 | 1.4%
uk | 245309 | 1.2%
pl | 197337 | 1.0%
tr | 196654 | 1.0%
Other values (56) | 1345629 | 6.6%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Year Account Created
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.789509
Minimum: 1970
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 155.7 MiB

Quantile statistics

Minimum: 1970
5-th percentile: 2009
Q1: 2012
median: 2016
Q3: 2020
95-th percentile: 2022
Maximum: 2022
Range: 52
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.500545935
Coefficient of variation (CV): 0.002232646769
Kurtosis: -1.381911647
Mean: 2015.789509
Median Absolute Deviation (MAD): 4
Skewness: -0.1234495251
Sum: 4.112640566 × 10^10
Variance: 20.25491371
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Common values
Value | Count | Frequency (%)
2021 | 2478516 | 12.1%
2022 | 2051629 | 10.1%
2020 | 1845686 | 9.0%
2009 | 1720640 | 8.4%
2011 | 1573615 | 7.7%
2012 | 1386073 | 6.8%
2014 | 1232417 | 6.0%
2010 | 1209564 | 5.9%
2019 | 1195174 | 5.9%
2013 | 1194260 | 5.9%
Other values (8) | 4514559 | 22.1%

Minimum 10 values
Value | Count | Frequency (%)
1970 | 14 | < 0.1%
2006 | 3161 | < 0.1%
2007 | 75373 | 0.4%
2008 | 319643 | 1.6%
2009 | 1720640 | 8.4%
2010 | 1209564 | 5.9%
2011 | 1573615 | 7.7%
2012 | 1386073 | 6.8%
2013 | 1194260 | 5.9%
2014 | 1232417 | 6.0%

Maximum 10 values
Value | Count | Frequency (%)
2022 | 2051629 | 10.1%
2021 | 2478516 | 12.1%
2020 | 1845686 | 9.0%
2019 | 1195174 | 5.9%
2018 | 987842 | 4.8%
2017 | 1108220 | 5.4%
2016 | 1000913 | 4.9%
2015 | 1019393 | 5.0%
2014 | 1232417 | 6.0%
2013 | 1194260 | 5.9%

Interactions

[64 interaction plots (Matplotlib SVG images) omitted]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
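
As a minimal sketch of that recipe, the code below ranks both variables (tied values share their average rank, which is an assumption about tie handling) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names and toy data are hypothetical and are not taken from pandas-profiling.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but nonlinear in xs
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this toy data ρ is essentially 1 because the relationship is perfectly monotonic, while r on the raw values is lower because the relationship is not linear.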

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
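
The calculation can be sketched in a few lines of plain Python. The function name and the toy data are hypothetical; the example also checks the invariance property by shifting and rescaling each variable with arbitrary, assumed constants.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

r_original = pearson_r(xs, ys)
# Separate location/scale changes with positive scale factors leave r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 1 for y in ys])
print(r_original, r_rescaled)
```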

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
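
The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements it directly; it is quadratic in the number of observations, so it is only meant for small illustrative inputs. The function name and data are hypothetical, and the report's own computation may use a tie-adjusted variant instead.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts for neither side
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair out of six
print(kendall_tau_a(xs, ys))
```

With one discordant pair out of six, the result is (5 - 1) / 6.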

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Unnamed: 0 | User ID | Following | Followers | Total Tweets | Tweet ID | Date of Tweet | Retweet Count | Language | Year Account Created
001688277411583928836615096819500421980302022-04-01 00:00:003412en2008
1132052960691228819985315096819501513482292022-04-01 00:00:00100en2015
22123594086981280972823172548115096819506839265562022-04-01 00:00:009en2020
33134798537556696678439937730115096819511160463362022-04-01 00:00:00573en2021
44150539481663684608315825898215096819513049907202022-04-01 00:00:00190en2022
557996525087717662747662024460115096819520009379992022-04-01 00:00:001en2016
66128064877334606643213435496796615096819529782108492022-04-01 00:00:005en2020
771767363570203326639474615096819530538434662022-04-01 00:00:002en2008
884667139620422219529332015096819530914570352022-04-01 00:00:003en2009
9912754756066841722901676102665115096819534187110502022-04-01 00:00:000en2020

Last rows

Unnamed: 0 | User ID | Following | Followers | Total Tweets | Tweet ID | Date of Tweet | Retweet Count | Language | Year Account Created
20402123204021231502125154461233152793758240915096777195498455062022-03-31 23:43:113es2022
2040212420402124787156255089360896243612956952615096777203889315892022-03-31 23:43:112845en2016
20402125204021258630004595768811525001410141859415096777205144330282022-03-31 23:43:1116en2017
2040212620402126147847589868913459232012184915096777216595271682022-03-31 23:43:1110und2022
20402127204021271183429966697910272414150685415096777225906053132022-03-31 23:43:123en2019
20402128204021281502237057195945984482348658315096777231360204802022-03-31 23:43:1253en2022
20402129204021298231396041150013444168453152115096777244906414102022-03-31 23:43:1235en2017
204021302040213015020281009676247043941323415096777248807034952022-03-31 23:43:1267ar2022
20402131204021311126857882308304896309199345615096777249145692202022-03-31 23:43:120ja2019
20402132204021321465798903152922625865452259515096777272883732552022-03-31 23:43:137en2021